PLANS FOR DISCOURSE

Barbara  J.  Grosz,  Harvard  University 
Candace  L.  Sidner,  BBN  Labs  Inc. 


1  Intentions  and  Actions  in  Discourse  Structures 
Theory 

In  Grosz  and  Sidner  [GS86],  we  proposed  a  theory  of  discourse  structure  comprising 
three  components:  a  linguistic  structure,  an  intentional  structure,  and  an  attentional 
state.  These  three  constituents  of  discourse  structure  deal  with  different  aspects  of 
the utterances in a discourse. Utterances (the actual saying or writing of particular
sequences of phrases and clauses) are the linguistic structure’s basic elements. Intentions
and the relations of domination and satisfaction precedence provide the basic
elements of the intentional structure. Attentional state contains information about
the objects, properties, relations, and discourse intentions that are most salient at
any given point. It is an abstraction of the focus of attention of the discourse
participants; it serves to summarize information from previous utterances crucial for
processing  subsequent  ones,  thus  obviating  the  need  for  keeping  a  complete  history 
of  the  discourse. 

In  our  earlier  paper  we  argued  that  the  natural  segmentation  of  a  discourse 
reflects  intentional  behavior;  each  segment  is  engaged  in  for  the  purpose  of  satisfying 
a  particular  intention.  That  intention  is  designated  as  the  discourse  segment  purpose 
(DSP),  i.e.  the  basic  reason  for  engaging  in  that  segment  of  the  discourse.  DSPs 
are  intended  to  be  recognized.  The  utterances  in  a  discourse  provide  information 
necessary  for  a  hearer  or  reader  to  determine  what  the  speaker  or  writer’s  DSPs  are. 
We  raised  a  number  of  questions  about  the  recognition  of  intentions  that  play  this 
key  role  in  the  discourse  and  that  are  present  in  the  intentional  structure  (not  all 
of  the  intentions  expressed  in  utterances  of  the  discourse  appear  in  the  intentional 
structure). 

Our basic view is that a conversational participant needs to recognize the discourse
segment purposes and the dominance relationships between them in order
to process subsequent utterances of the discourse; the intentional structure is part
of  the  context  of  the  discourse.  Although  in  our  previous  paper  we  pointed  out 
a  number  of  kinds  of  information  that  would  play  a  role  in  processing  -  specific 
linguistic  markers,  utterance  level  intentions  and  general  knowledge  about  actions 
and  objects  in  the  domain  of  discourse  -  we  did  not  propose  an  actual  processing 
model.  A  computational  theory  of  the  recognition  of  discourse  segment  purposes 
depends  on  underlying  theories  of  intention,  action,  and  plans.  These  theories  must 
be  appropriate  for  discourse  actions  and  intentions. 

Previous work on planning and plan recognition for natural language might seem
to provide the basis for such theories. However, as we examined that work we realized
that various assumptions it made about plans, actions, and agents were
inappropriate for the general discourse situation, and precluded any simple type of
generalization. In particular, it did not provide the right basis for explaining collaborative
behavior. Discourses are fundamentally examples of collaborative behavior.
The  participants  in  a  discourse  work  together  to  satisfy  various  of  their  individual 
and  joint  needs.  Thus,  to  be  sufficient  to  underlie  discourse  theory,  a  theory  of 
actions,  plans,  and  plan  recognition  must  deal  adequately  and  appropriately  with 
collaboration. 

Discourses may exhibit two types of collaborative behavior: collaboration in the
domain of discourse (e.g., working together to write a paper) and collaboration with
respect to the discourse itself. Although we cannot yet define (either intensionally
or extensionally) “collaboration with respect to a discourse,” it includes not only
surface collaborations (e.g., coordinating turns in a dialogue) and the use of appropriate
referring expressions [Gro78,CW86], but also collaborations related to the discourse
purpose.  For  example,  the  participants  collaborate  to  ensure  that  the  utterances  of 
the  discourse  itself  provide  sufficient  information  to  make  possible  the  satisfaction 
of  the  discourse  purpose.  We  have  examined,  and  will  discuss  in  this  paper,  the 
sorts of plans and intentions involved in what we called the “action” case - roughly,
the recognition of DSPs that embed in some way intentions to perform actions. We
will thus focus in this paper on collaboration in the domain. Searle [this volume]
addresses similar issues concerning appropriate theories for explaining how two (or
more) people work together to accomplish goals; although his detailed proposals are
different,  they  appear  to  be  similar  in  spirit. 

In  this  paper  we  first  examine  the  characteristics  of  the  discourse  situation  and 
the  ways  in  which  they  affect  plan  recognition.  We  then  briefly  review  and  critique 
previous  work  on  plans  and  plan  recognition  for  natural  language.  We  address  two 
particular  concerns:  an  imbalance  in  the  typical  characterization  of  the  speaker  and 
hearer  roles,  and  the  need  to  coordinate  intentions  of  different  agents.  Finally,  we 
propose a new type of plan, one that more naturally underlies the type of collaborative
effort that dialogues typically comprise. We discuss briefly how this type of
plan  can  be  used  to  constrain  the  recognition  process. 

2  The  Character  of  Plans  Underlying  Discourse 

At  any  point  in  a  discourse,  a  participant  may  form  and  undertake  a  number  of 
different plans. Of all such plans, we will be interested only in those that are
intended  to  be  recognized  by  the  other  discourse  participant;  this  is  much  like  Grice’s 
depiction  of  the  class  of  intentions  underlying  an  utterance  that  are  intended  to  be 
recognized.  As  we  discussed  in  our  previous  paper,  there  is  no  simple  mapping 
between  linguistic  expressions  and  the  intentions  and  plans  underlying  a  discourse. 
No  distinguished  type  of  [linguistic]  expression  is  used  to  convey  information  about 
plans  intended  to  be  recognized. 

For example, definite descriptions may convey intended-plan information, or
may be designed for entirely different purposes. In designing a definite description,
a speaker may plan to add information that aids a hearer in identifying an object
(cf. [App85]); this plan is not intended to be recognized. In contrast, descriptions
that are conversationally relevant [Kro86] are realizations of plans that are intended
to be recognized. Likewise, there are plans for sequences of utterances, only some of
which are intended to be recognized. For instance, a speaker may plan to convey
the information in a discourse segment in a certain sequence (conventionally used
to convey such information) without intending that the hearer recognize this plan
(cf. [McK83]). Finally, in some discourses, a speaker may intend that his plan not
be recognized, because its recognition would foil his goals (e.g., the socially-oriented
plans discussed by Hobbs and Evans [HE80]).

Plan  recognition  is  the  process  of  inferring  an  actor’s  plan  on  the  basis  of  partial 
information about a portion of it. Plan recognition for discourse concerns the recognition
of plans that are intended to be recognized. This simple definition, when put
into  practice,  is  colored  by  a  multitude  of  issues.  Some  of  these  are  foundational 
questions  about  the  nature  of  a  plan.  Is  a  plan  a  collection  of  actions  that  an  actor 
is about to undertake? Is it a collection of an actor’s intentions and beliefs to act
in  some  way?  Can  a  plan  include  actions  performed  by  other  agents  or  refer  to 
beliefs  held  by  another  agent?  Other  questions  concern  the  conditions  under  which 
a  particular  plan  is  inferred.  What  is  the  relation  between  the  actor  of  a  plan  and 
an  inferring  agent  (i.e.  the  agent  who  is  inferring  the  actor’s  plan)?  Does  the  actor 
know  that  he  is  being  observed?  Is  there  any  attempt  on  the  part  of  the  actor  to 
ensure that the inferring agent has all the information needed to infer the plan? How
do  the  actor  and  inferring  agent  share  information  about  the  plan? 

The  communicative  situation  exerts  strong  constraints  on  the  plan  recognition 
problem  for  natural-language  processing.  Each  discourse  participant  undertakes 
plans  to  accomplish  his  own  desires,  and  collaborates  in  plans  to  achieve  the  desires 
of  other  participants.  Discourse  participants  are  thus  both  actors  and  inferring 
agents  involved  in  the  recognition  of  each  other’s  plans.  As  we  will  show  later, 
collaborative  plans  play  a  prominent  role  in  discourse;  their  construction  and  use 
require  that  participants  make  clear  to  one  another  how  their  actions  will  coordinate 
and  contribute  to  the  satisfaction  of  the  discourse  purpose.  Thus,  speakers  must 
provide  in  their  utterances  sufficient  information  about  their  beliefs  and  intentions 
for  their  hearers  to  be  able  to  determine  how  these  contribute  to  the  (collaborative) 
plan,  and  hearers  must  be  attuned  to  those  cues  of  language  as  well  as  to  properties 
of  the  discourse  situation  that  constrain  their  inference  of  the  plan. 

Various linguistic devices provide explicit information about intentions; of these,
the most extensively considered have been cue phrases [GS86,PS83,Rei84] and
intonation [HP86,HL87]. For example, speakers can use such devices to tell their hearers
when they complete a discourse segment (reflecting a belief that they have said all
that needs to be said to satisfy its DSP) and are moving on to another DSP and
segment. They also may use them to signal the temporary interruption of one segment
(and the attempt to satisfy its DSP) so that they may pursue another unrelated
(but momentarily more important) DSP.

Furthermore, although discourse participants may hold a wide range of mutual
beliefs, each also has private beliefs. None has either complete or perfect
information, and in general their beliefs may differ. In particular, the knowledge that
discourse participants bring to the discourse about the plans of others is incomplete,
and their beliefs about how actions can be combined to achieve desired states are often
different. Typically the information needed to infer the plan of another discourse
participant is conveyed not in a single utterance, but in a sequence of utterances.
Thus, the plan recognition process for discourse entails incremental recognition on
the basis of partial information, accommodation of uncertainty (e.g., treating
disjunctive possibilities), and strategies for resolving inconsistencies in beliefs among
participants [Pol86].

Finally,  two  types  of  actions  may  be  performed  by  participants  in  a  discourse: 
domain  actions  and  communicative  actions.  Domain  actions  are  those  actions  that 
change the world directly. Communicative actions, accomplished by utterances,
directly affect the beliefs of the discourse participants (and may through this lead
to domain actions that affect the outside world). They may also affect the state
of the discourse; for example, they may change the attentional state by pushing or popping
focus  spaces  or  by  introducing  new  entities  into  a  space. 
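
The effect of communicative actions on the attentional state can be pictured with a small stack of focus spaces. The sketch below is illustrative only: the names `FocusSpace` and `AttentionalState`, their methods, and the example segment purposes are assumptions introduced here, not part of the theory's formal statement.

```python
from dataclasses import dataclass, field


@dataclass
class FocusSpace:
    """Entities salient while one discourse segment is being processed."""
    dsp: str                                    # the segment's purpose (DSP)
    entities: set = field(default_factory=set)


class AttentionalState:
    """Hypothetical stack of focus spaces; the top space is the most salient."""

    def __init__(self):
        self._stack = []

    def push(self, dsp):
        # Opening a new segment pushes a fresh focus space.
        self._stack.append(FocusSpace(dsp))

    def pop(self):
        # Completing a segment pops its focus space.
        return self._stack.pop()

    def introduce(self, entity):
        # An utterance introduces a new entity into the current space.
        self._stack[-1].entities.add(entity)

    def salient(self):
        # Entities listed from the most salient space to the least salient.
        return [e for space in reversed(self._stack) for e in sorted(space.entities)]


state = AttentionalState()
state.push("write the paper")
state.introduce("paper")
state.push("fix section 2")
state.introduce("section 2")
print(state.salient())   # ['section 2', 'paper']
state.pop()
print(state.salient())   # ['paper']
```

Only the stack discipline matters here; how salience is ordered within a single space is left open.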

3  Plans  and  Plan  Recognition  Algorithms  Thus  Far 

Some  of  the  assumptions  underlying  prior  work  on  plan  recognition  for  natural- 
language  processing  have  differed  from  the  characterization  of  discourse  we  have  just 
sketched. Typically, it has been assumed that one agent had desires and produced
utterances (the speaker) and the other agent (the hearer) attempted to infer from
these utterances the speaker’s goals and plans; we will dub this the master-slave
assumption. In addition, it has been assumed that the inferring agent’s knowledge
of actions and how they are related constitutes a correct and complete description
of  what  agents  can  do.  Furthermore,  the  predominant  representation  of  plans  has 
been  one  originally  developed  for  planning  by  a  single  agent  who  is  situated  in  a 
world  that  only  changes  as  a  result  of  her  own  actions.  In  this  section  we  briefly 
review  the  main  constructs  used  in  prior  work,  critique  their  use  as  the  basis  for 
plan  recognition  in  discourse,  and  discuss  which  representations  and  processes  can 
be  adapted  to  support  the  kind  of  communicative  situation  that  we  have  in  mind. 

In the past ten years a number of AI researchers have explored issues concerned
with the representation of plans and actions, and algorithms for inferring one particular
plan on the basis of partial information [Bru75,BN78,SSG79,All79,AP80,
Sid83,Sid85,KA86]. In the natural-language processing work on plan recognition, a
speaker (filling the actor role) engages a hearer (the inferring agent) in discussion
about actions and conditions that the speaker desires. The speaker may want the
hearer to do some specific act (e.g., to flip the living room light switch to turn on
the living room lights) or to do whatever act will produce a specific effect (e.g., make
it light in the living room). The speaker’s utterances serve the purpose of telling
the hearer the particular act or effect, and possibly some other information. The
hearer (as inferring agent) is assumed to be ready to carry out the specific act or
produce the effect once it is clear what it is. This research has also assumed that the
actor (speaker) was aware of the inferring agent (hearer) and intended for the inferring
agent to draw certain conclusions about the actor’s plan (called the intended
recognition assumption [Coh81]).

Almost all of the work on plan recognition algorithms has been based on the
same representational formalism, namely that developed for STRIPS [FN71] and its
descendants (e.g., NOAH [Sac77]). In this formalism operators are used to model
actions, where an operator comprises three parts: a description of an action in
terms of subactions (the body)^, a precondition that must be true to carry out the
action, and an effect that holds once the action is accomplished. Because the body
of an operator could contain subactions, the operators could in principle express
decompositions of actions into other actions. A plan was an assembly of operators
that described how to get from an initial state to a final state (called the goal). In
both STRIPS and NOAH, operators were actually schemata for a class of actions; for
example, the operator Pickup(a x) described the class of actions that included such
instances as Pickup(Johnnie redtruck) and Pickup(Robot1 screw2). The operators
were not in subsumption hierarchies; no mechanism existed to express the relations
among operator classes (e.g., that the operator for transfer of objects by agents
subsumes the operators for giving, taking, stealing, dropping off, etc.).
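
The operator-as-schema idea can be made concrete with a short sketch. It is not the STRIPS or NOAH implementation: the `Operator` class and its `instantiate` method are hypothetical, and the precondition and effect literals given for Pickup are invented for illustration, since the text supplies only the header Pickup(a x).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A STRIPS-style operator schema: header, precondition, effect, body."""
    name: str
    params: tuple          # variable names, e.g. ("a", "x")
    precondition: tuple    # literals that must hold before the action
    effect: tuple          # literals that hold once the action is done
    body: tuple = ()       # subactions, possibly empty

    def instantiate(self, *args):
        """Bind the schema's variables to constants, yielding one action instance."""
        if len(args) != len(self.params):
            raise ValueError("wrong number of arguments")
        binding = dict(zip(self.params, args))

        def subst(literal):
            head, *terms = literal
            return (head, *[binding.get(t, t) for t in terms])

        return {
            "action": (self.name, *args),
            "precondition": tuple(subst(lit) for lit in self.precondition),
            "effect": tuple(subst(lit) for lit in self.effect),
            "body": tuple(subst(lit) for lit in self.body),
        }


# Assumed literals; only the header Pickup(a x) comes from the text.
pickup = Operator(
    name="Pickup",
    params=("a", "x"),
    precondition=(("HandEmpty", "a"), ("Clear", "x")),
    effect=(("Holding", "a", "x"),),
)

print(pickup.instantiate("Johnnie", "redtruck")["action"])
# ('Pickup', 'Johnnie', 'redtruck')
print(pickup.instantiate("Robot1", "screw2")["effect"])
# (('Holding', 'Robot1', 'screw2'),)
```

Nothing in this representation relates one schema to another, which is the missing subsumption structure noted above.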

The STRIPS formalism was developed for planning purposes; it had to be
adapted in several ways before it could be used for plan recognition. To reconstruct
the plan of another agent, recognition processes used heuristic rules that
indicated how an agent’s desires could be linked to preconditions, bodies, or effects
of actions. Allen’s system used operator definitions, the bodies of which contained
at most single actions. Sidner and Kautz each augmented the formalism to include
subsumption hierarchies over both the operators as a whole and the decompositions
of actions within the body of an operator. In addition, their operator bodies
typically include sequences of multiple subactions.

Plan recognition work for language processing has proposed various explanations
for why hearers need to infer a speaker’s plan and how they do so. In their pioneering
work on speech acts and plan recognition, Allen and Perrault [AP80] assumed
that both the speaker’s goal and his plan for satisfying that goal were unknown to
the hearer. They defined a recognition process for inferring a speaker’s goal and
plan; it used information from a single utterance combined with (presumed shared)
knowledge of possible plans. The inferred plan comprised a combination of communicative
actions and domain actions. It reflected the hearer’s reasoning about how
the speech act was related to the speaker’s desire. Allen and Perrault also showed
how  the  plan  provided  the  context  in  which  to  determine  an  appropriate  response. 
In  particular,  after  inferring  the  speaker’s  goal  and  his  plan  for  achieving  that  goal, 
a  cooperative  hearer  would  provide  information  that  was  missing  in  the  plan  and 
needed  in  order  for  it  to  be  carried  out  by  the  speaker. 

In Sidner’s work, the goal and plan also were inferred, but incrementally over
successive utterances of the conversation. Sidner augmented Allen’s original framework
by concentrating on the recognition of complex descriptions of actions and on
the multi-utterance nature of discourse. According to her theory, a hearer was to
accomplish whatever specific actions a speaker had conveyed as desired. Each utterance
was viewed as providing partial information about the speaker’s plan. Thus,
after each utterance the hearer was considered to have a partial description (which
we will call the hearer’s action description) of the speaker’s plan; information in
subsequent utterances enabled the hearer to refine the action description. Since actions
were modeled in an abstraction (subsumption) hierarchy, plan recognition was
taken to be a process of recognizing a more specific goal by deriving more specific
action descriptions from the abstraction hierarchy. The specific action description
inferred at the end of the discourse (segment) was considered to be the speaker’s plan.

To  illustrate  how  to  use  the  refined  action  description,  we  will  consider  the 
following  simple  example.  Someone  says  “I’m  going  on  a  date  tonight.  Can  you 
pick  up  something  at  the  florist  for  me?”  In  this  example,  recognition  is  simplified 
because  the  speaker  makes  explicit  the  (domain)  desire  (to  go  on  a  date)  that  leads 
to his secondary desire that the hearer do a specific action (get something from the
florist)  that  will  aid  in  the  satisfaction  of  his  primary  desire.  The  speaker  intended 
that  the  hearer  would  recognize  that  the  florist  visit  is  in  aid  of  the  speaker’s  plan 
for  meeting  the  date-desire;  thus  the  action  of  visiting  the  florist  is  to  obtain  flowers 
for  the  date.  Furthermore,  the  hearer  is  intended  to  use  this  information  to  choose 
flowers  appropriate  to  the  occasion  (red  roses  rather  than  a  potted  plant). 

Plan recognition can be much more complex when it requires refinement over
several utterances of a discourse without a direct statement of what the speaker was
up to. If a speaker asked a hearer to get his good suit from the cleaners, and then
a while later asked for something from the florist, and that the car be washed and
filled with gas, the hearer could again infer that the speaker was about to go on a
date. However, in this case an incremental search of the action abstraction hierarchy
would be undertaken. The first utterance provides a piece of information from which to infer
that the speaker may be getting dressed up to go somewhere; the later utterances
provide the additional information needed to conclude that the more specific plan is to
go  on  a  date. 
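
The incremental refinement just described can be sketched as a search that narrows a set of candidate plans after each request and reports the most specific description covering all of them. This is a minimal illustration under stated assumptions, not Sidner's algorithm: the plan library, the abstraction links, and the function names are all hypothetical.

```python
# Hypothetical plan library: each specific plan lists the actions it decomposes into.
PLAN_STEPS = {
    "go on a date": {"get suit from cleaners", "get flowers from florist", "wash car"},
    "attend an interview": {"get suit from cleaners", "print resume"},
    "visit a patient": {"get flowers from florist"},
}

# Hypothetical abstraction (subsumption) links from a plan to its more general parent.
PARENT = {
    "go on a date": "dress up to go somewhere",
    "attend an interview": "dress up to go somewhere",
    "visit a patient": "go somewhere",
    "dress up to go somewhere": "go somewhere",
}


def ancestors(plan):
    """Return the plan followed by its increasingly general ancestors."""
    chain = [plan]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def describe(candidates):
    """Most specific description that subsumes every remaining candidate."""
    if not candidates:
        return None
    chains = [ancestors(c) for c in sorted(candidates)]
    for node in chains[0]:
        if all(node in chain for chain in chains):
            return node
    return None


def recognize(observations):
    """Refine the action description after each observed request."""
    candidates = set(PLAN_STEPS)
    history = []
    for act in observations:
        candidates = {p for p in candidates if act in PLAN_STEPS[p]}
        history.append(describe(candidates))
    return history


print(recognize(["get suit from cleaners", "get flowers from florist", "wash car"]))
# ['dress up to go somewhere', 'go on a date', 'go on a date']
```

After the first request only the general description is warranted; the second request rules out the competing specialization.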

Kautz’s general theory of plan recognition redefined the plan recognition process
as deduction based on a set of observations, an action taxonomy, and one or more
simplicity conditions (AAAI86, p. 123). The general criteria underlying his algorithm
include that two or more actions may be interleaved, and that an action can
simultaneously be part of more than one action description. His theory makes no
specific assumptions about communication between a speaker and a hearer. Thus
the recognition algorithm takes the view of observing actions without the actor of
the plan having awareness of the inferring agent’s presence (called keyhole recognition
[Coh81]). Kautz’s model takes a more general view of plan recognition than
previous work.

However,  Kautz’s  model  includes  some  more  restrictive  assumptions  as  well.  It 
assumes  that  recognition  is  undertaken  with  a  complete  list  of  observed  actions.^ 
The model also incorporates three important limiting assumptions about the representation
of actions: (1) the specialization hierarchy encodes a complete and mutually
exclusive set of specializations; (2) the decomposition hierarchy is complete;
(3) if two observed actions might be part of one plan, they are taken to be part of
the same plan. [(3) is called the simplicity condition.]

Assumptions (1) and (2) were also made, explicitly or implicitly, in all work
prior to Kautz’s. These assumptions are problematic for plan recognition applied
to discourse because the participants operate with incomplete knowledge of one
another. Pollack [Pol86] argues this case quite clearly in considering appropriate
responses  to  questions. 

Assumption (3) has been made^ as a means of limiting the observer’s incremental
search for the most general plan the observed agent is pursuing. This assumption
limits search by constraining the number of possible plans. It thus helps in those
cases  in  which  actions  do  fit  together.  However,  it  offers  no  special  help  in  those 
cases  in  which  two  (sequentially  observed)  actions  are  not  part  of  the  same  plan. 
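
One way to see what the simplicity condition buys is to treat it as a preference for the fewest plans that account for all observations. The brute-force search below is a stand-in written for illustration, not Kautz's deductive formulation; the library contents and the name `minimal_explanations` are assumptions.

```python
from itertools import combinations

# Hypothetical decomposition knowledge: which actions can be part of which plan.
LIBRARY = {
    "go on a date": {"get suit", "get flowers", "wash car"},
    "attend an interview": {"get suit", "print resume"},
    "visit a patient": {"get flowers"},
}


def minimal_explanations(observed, library=LIBRARY):
    """Return all smallest sets of plans that together account for every observation."""
    plans = sorted(library)
    for size in range(1, len(plans) + 1):
        covers = [
            set(combo)
            for combo in combinations(plans, size)
            if all(any(act in library[p] for p in combo) for act in observed)
        ]
        if covers:
            return covers   # stop at the first size that explains everything
    return []


# The two actions fit one plan, so a single explanation survives.
print(minimal_explanations(["get suit", "get flowers"]))
# [{'go on a date'}]

# The two actions share no plan; two candidate pairs remain.
print(len(minimal_explanations(["print resume", "get flowers"])))
# 2
```

When the observed actions do not fit a single plan, the preference leaves several candidate pairs standing, so the search is not narrowed in that case.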

The  fact  that  communication  in  natural  language  rests  in  part  on  an  assumption 
of  intended  recognition  allows  for  a  modified  form  of  assumption  (3),  which  aids 
communication in both action cases: a speaker must mark those cases where two
actions  are  not  part  of  the  same  plan.  By  marking  such  shifts,  a  speaker  provides 
the  information  needed  to  reduce  the  incremental  search  when  two  actions  do  not  fit; 
combined  with  the  assumption  of  intended  recognition,  it  justifies  a  hearer  assuming 
in  the  absence  of  such  markings  that  two  actions  are  intended  to  fit.  Thus,  plan 
recognition  for  natural  language  is  more  constrained  than  the  general  (keyhole) 
recognition  case  considered  by  Kautz.* 
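
The modified assumption amounts to a default: absent a marker, the hearer attaches each new action to the plan already under consideration. A minimal sketch follows, assuming each utterance has already been reduced to an action plus a flag saying whether the speaker marked a shift; the function name and the example requests are hypothetical.

```python
def group_by_plan(utterances):
    """Split (action, shift_marked) pairs into runs assumed to serve a single plan."""
    groups = []
    for action, shift_marked in utterances:
        if shift_marked or not groups:
            groups.append([action])      # an explicit marker starts a new plan
        else:
            groups[-1].append(action)    # no marker: assume the action fits
    return groups


requests = [
    ("get suit from cleaners", False),
    ("get flowers from florist", False),
    ("pay the phone bill", True),        # speaker marked this request as a shift
]
print(group_by_plan(requests))
# [['get suit from cleaners', 'get flowers from florist'], ['pay the phone bill']]
```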

In  addition  to  those  problems  just  discussed,  previous  plan  recognition  work  has 
had two major problems (pointed out by Pollack in [Pol86]). First, the view of plans
as  being  composed  solely  of  collections  of  actions  (and  their  associated  preconditions 
and  effects)  is  insufficient.  The  definition  of  a  plan  must  account  for  the  ways  in 
which  the  intentions  of  the  agent  who  is  (about)  to  perform  the  actions  and  his 


^ Kautz has considered incremental algorithms, but it is unclear just how these differ from the
basic algorithm, i.e., when the complete list of actions is available.

^ It has been made in all work that attempted to treat the possibility of multiple plans being
pursued simultaneously.

* Cohen [Coh83] proposes a similar role for cue phrases in limiting search for deriving the structure
of arguments. Sidner [Sid85] and Litman [Lit85] make similar claims for plan recognition.
beliefs about those actions affect the appropriateness and success of the plan.

The  current  state  of  plan  recognition  research  derives  in  part  from  the  nature  of 
the  tasks  addressed  by  STRIPS-type  systems  (namely  those  involving  robots),  and 
in  part  from  a  particular  set  of  natural  language  domain  tasks  (namely,  ones  that 
reflected  the  master/slave  assumption).  In  such  settings  it  might,  at  first  glance, 
seem  possible  to  ignore  the  intentions  of  the  planning  agent,  and  the  ways  in  which 
the  beliefs  of  the  planning  and  inferring  agents  may  differ.  However,  for  natural 
language  more  than  a  single  agent  may  be  involved  in  carrying  out  a  plan  and  more 
than  one  agent  must  have  access  to  knowledge  about  the  plan.  Thus  any  model 
(or  theory)  of  the  communicative  situation  must  distinguish  among  the  beliefs  and 
intentions  of  different  agents. 

A  second  problem  with  prior  plan  recognition  (and,  as  it  turns  out,  planning) 
models is the underlying model of actions on which they rest. As Pollack [Pol86] has
shown, the notions of precondition, body, and effect have been used to encode a variety
of different types of relationships in different ways on different occasions. They
are not well-defined either theoretically or in practice. For example, in the STRIPS-
type formalisms, opening a door can be described by an action operator with header
Open(agent,  door),  precondition  (Not-Open(door)),  effect  (Open(door)),  and  body 
[Put(agent,Hand-On-Knob(door)),Turn(agent,knob(door)),Pull-Knob(agent,door)]. 
This description fails to encode information such as which actions enable other actions,
which actions must stand in a sequence, which actions actually accomplish
the end action and which are supplementary, and what relation preconditions and
effects bear to the subactions of the action operator. As Allen [All84] has remarked,
the formalisms neither provide a natural description of simultaneous action nor treat
goals  of  maintenance  (i.e.,  desires  that  certain  properties  of  the  current  state  of  the 
world  be  maintained;  e.g.,  the  desire  to  stay  healthy).  The  STRIPS  formalism  has  no 
calculus  of  these  aspects  of  actions;  prior  plan  recognition  research  has  not  provided 
it. 

Pollack redefined plans in order to explain a type of language behavior involving
errors in a speaker’s plans. She defined plans as mental states of agents, i.e., as
a  particular  set  of  their  intentions  and  beliefs.  An  agent’s  (speaker  or  hearer) 
simple  plan^  was  defined  in  terms  of  a  set  of  beliefs  and  intentions:  beliefs  about 
the  relations  among  various  intended  actions,  and  about  the  executability  of  those 
actions;  and  intentions  (of  the  agent)  regarding  those  actions. 

To infer the speaker’s plan, Pollack pursued a special case of plan recognition
for  her  natural-language  examples.  Given  a  stated  speaker  desire  and  a  stated 
action  that  was  to  generate  additional  (unspecified)  actions  to  achieve  the  desire, 
the  plan  recognizer  found  a  path  between  the  desire  and  stated  action  by  filling  in 
the  unspecified  generated  actions.  This  kind  of  plan  recognition  algorithm  was  not 
a  departure  from  the  earlier  work,  but  it  made  use  of  a  very  different  formalism  for 

^ A simple plan relates actions only by the relation of generation [Gol70]; enabling relations
among  actions  remain  to  be  examined  in  future  work. 
a plan.

Pollack’s plan formalism allowed her to make a new distinction, between the actor’s plan
to  achieve  some  P  and  the  inferring  agent’s  own  (and  possibly  different)  description 
of  how  to  achieve  P.  Once  the  actor’s  plan  was  inferred,  the  inferring  agent  could 
inspect  it  to  determine  which  of  the  (actor’s)  beliefs  in  the  plan  differed  from  her  own 
beliefs  about  domain  actions.  These  differences  could  form  the  basis  of  a  response 
that  suggested  to  the  actor  a  more  appropriate  set  of  actions  for  achieving  his  goal.^ 
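
The two steps just described, filling in a chain of generation beliefs and then checking that chain against the inferring agent's own beliefs, can be sketched as follows. This is an illustration under stated assumptions rather than Pollack's formalism: generation beliefs are reduced to plain pairs of action names, and the function names and example actions are hypothetical.

```python
from collections import deque


def generation_path(beliefs, stated_action, desired_action):
    """Breadth-first search for a chain of generation beliefs linking the two actions."""
    queue = deque([[stated_action]])
    seen = {stated_action}
    while queue:
        path = queue.popleft()
        if path[-1] == desired_action:
            return path
        for source, target in sorted(beliefs):
            if source == path[-1] and target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None


def disputed_links(path, own_beliefs):
    """Generation links in the inferred plan that the inferring agent does not share."""
    return [link for link in zip(path, path[1:]) if link not in own_beliefs]


# Beliefs the inferring agent attributes to the actor (assumed for illustration).
actor_beliefs = {
    ("issue permission command", "set file permissions"),
    ("set file permissions", "keep Tom out of mail file"),
}
# The inferring agent's own beliefs about the same domain actions.
own_beliefs = {("issue permission command", "set file permissions")}

plan = generation_path(actor_beliefs, "issue permission command", "keep Tom out of mail file")
print(plan)
# ['issue permission command', 'set file permissions', 'keep Tom out of mail file']
print(disputed_links(plan, own_beliefs))
# [('set file permissions', 'keep Tom out of mail file')]
```

The disputed link is the point at which a response could suggest a different action to the actor.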

Pollack’s  definitions  of  intentions  and  of  the  simple  plan  of  an  agent  provide  a 
much  richer  and  cleaner  model  of  an  agent’s  plan  to  achieve  some  desire  on  the 
basis  of  a  simple  action  or  sequence  of  actions.  The  richness  originates  with  the 
addition  of  intentions,  and  beliefs  about  execution  and  generation  among  actions. 
Her  model  clearly  distinguishes  among  believing  that  actions  fit  together  in  certain 
regular  ways,  believing  that  one  can  execute  those  actions,  and  actually  intending 
to  act. 

Pollack’s  definition  of  plans  has  turned  out  to  be  most  useful  to  us  for  discourse 
theory because it rests on a detailed treatment of the relations among actions (relations
of generation and enablement) and because it distinguishes the intentions and
beliefs  of  an  agent  about  those  actions.  Since  her  plan  model  is  the  simple  plan  of 
a  single  agent,  we  need  to  extend  the  model  to  plans  of  two  or  more  collaborative 
agents.  Extension  to  plans  involving  enabling  as  well  as  generating  actions  will  await 
another  paper. 

4  Shared  Plans 

Shared Plans are a notion intended to remedy several problems we mentioned above:
the tendency of existing work to make the master-slave assumption, the embedding
of intended actions in the context “speaker intends hearer to intend” in describing
the speaker plans that are to be inferred, and the frequent failure to distinguish
between building an agent that does plan recognition and providing a description of
the state in which recognition occurs.

In our previous paper we pointed out that discourse-segment purposes (DSPs)
are a natural extension of Gricean intentions at the utterance level. In extending
Grice’s definitions to the discourse level for the action case we argued that DSPs
were of the form Intend(ICP, Intend(OCP, Do(A))...) where ICP is the discourse
participant who initiates the segment, OCP is the other participant,^7 and the ellipsis
includes subordinate intentions, not crucial for the point at hand. This definition
was a natural extension of work on utterance-level intention recognition that linked
a speaker’s desires with a hearer’s action or intention to act (e.g., Allen’s Nested-
Planning Rule includes expressions of the form Want(Speaker Want(Hearer P))).

^6 Pollack’s work, like all previous work, assumes that the inferring agent has complete and
accurate knowledge of domain actions.

^7 We introduced these terms because either participant can be a speaker of other utterances in the
segment and hence the usual [master-slave assumption] use of Speaker and Hearer to differentiate
roles will not work.

Although  the  definition  of  DSPs  seems  approximately  right,  tying  it  to  plan 
recognition  and  plan  recognition  algorithms  requires  a  definition  of  what  it  would 
mean  for  one  agent  (ICP)  to  intend  that  another  agent  (OCP)  do  (or  intend  to  do) 
something.  The  usual  notion  of  intention  cannot  be  extended  naturally  to  cover  this 
case.  Although  previous  work  on  plan  recognition  (at  the  utterance  level)  uses  such 
a  notion,  it  presumes,  rather  than  provides,  a  definition.  Furthermore,  there  have 
been  strong  philosophical  arguments  that  intention  is  a  first-person  attitude,  i.e., 
that  the  objects  of  intention  are  actions  of  the  intending  agent  (e.g.,  [Bra83,Dav79]). 

Second,  serious  consideration  of  dialogue  makes  it  clear  that  the  master-slave 
assumption  is  the  wrong  basis  on  which  to  build  a  theory  of  discourse.  This  assump¬ 
tion  encourages  theories  that  are  unduly  oriented  toward  there  being  one  controlling 
agent  and  one  reactive  agent.  Only  one  agent  has  any  control  over  the  formation  of 
the  plan;  the  reactive  agent  is  involved  only  in  execution  of  the  plan  (though  to  do 
so  he  must  first  figure  out  what  that  plan  is).  We  conjecture  that  the  focus  of  speech 
act  and  plan  recognition  work  on  single  exchanges  underlies  its  (implicit)  adoption 
of  the  master-slave  assumption.  To  account  for  extended  sequences  of  utterances, 
it  is  necessary  to  realize  that  two  agents  may  develop  a  plan  together  rather  than 
merely  execute  the  existing  plan  of  one  of  them.  That  is,  language  use  is  more 
accurately  characterized  as  a  collaborative  behavior  of  multiple  active  participants. 

Finally,  language  use  is  not  the  only  form  of  cooperative  behavior  which  requires 
a notion of shared plans. A variety of nonlinguistic actions and plans cannot be
explained  solely  in  terms  of  the  private  plans  of  individual  agents  (cf.  Searle,  this 
volume).  For  example,  consider  the  situation  portrayed  in  Figure  1.  Two  children 
each  have  a  pile  of  blocks;  one  child’s  blocks  are  blue,  the  other’s  green.  The 
children  decide  to  build  together  a  tower  of  blue  and  green  blocks.  It  is  not  the 
case  that  their  plan  to  build  this  tower  is  any  combination  of  the  first  child’s  plan 
to  build  a  tower  of  blue  blocks  with  some  empty  spaces  (in  just  the  right  places 
to  match  the  other  child’s  plan)  and  the  second  child’s  plan  to  build  a  tower  of 
green  blocks  with  some  empty  spaces  (again  in  just  the  right  places).  Rather,  they 
have  some  sort  of  joint  plan  that  includes  actions  by  each  of  them  (the  first  child 
adding blue blocks, the second green ones).^8 In a more practical vein, the concept
of  shared  plans  provides  a  foundation  for  theories  of  collaborative  behavior  that 
could  provide  for  more  flexible  and  fluent  interactions  between  computer  systems 
and  users  undertaking  joint  problem-solving  activities  (e.g.,  systems  for  diagnosis). 


^8 This example provides an extremely productive analog for modelling dialogue. Each utterance
or segment is like a block, placed by the participant (builder) on the existing structure (discourse
or tower) to extend it in ways that help achieve the original purpose. A major difference,
however, is that the tower is an end in itself whereas the discourse is a means to achieve the
discourse purpose.

5 Shared Plans in Discourse

To account for the collaborative behavior we believe is manifest in discourse, we will
define a new construct, that of two agents having a shared plan. The definition is
based on Pollack’s definition of a single agent having a simple plan.^9 Like Pollack,
we will adopt Allen’s interval-based temporal logic as the basic formalism for
representing actions. We will use Pollack’s modification of the predicate representing
the occurrence of an action, OCCURS: the predication OCCURS(α, G, t) is true if
and only if the act-type α is performed by G during time interval t.

Pollack defined the SimplePlan of the single agent as follows:

SimplePlan(G, α_n, [α_1, ..., α_{n-1}], t2, t1) <=>

1. BEL(G, EXEC(α_i, G, t2), t1) for i = 1, ..., n-1 &

2. BEL(G, GEN(α_i, α_{i+1}, G, t2), t1) for i = 1, ..., n-1 &

3. INT(G, α_i, t2, t1) for i = 1, ..., n-1 &

4. INT(G, BY(α_i, α_{i+1}), t2, t1) for i = 1, ..., n-1
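Read operationally, the four clauses amount to a conjunction of membership checks over one agent's beliefs and intentions. The following Python sketch is only an illustration under simplifying assumptions: the time parameters t1 and t2 are dropped, attitudes are stored as plain sets, and the names AgentState and has_simple_plan, as well as the example acts, are hypothetical and not part of Pollack's formalism.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical record of one agent's attitudes (time parameters omitted)."""
    believes_exec: set = field(default_factory=set)  # acts G believes executable
    believes_gen: set = field(default_factory=set)   # (act, generated act) pairs
    intends: set = field(default_factory=set)        # acts G intends to do
    intends_by: set = field(default_factory=set)     # (act, generated act) pairs


def has_simple_plan(state, chain):
    """chain is [a_1, ..., a_n], with a_n the goal act; checks clauses (1)-(4)."""
    links = list(zip(chain[:-1], chain[1:]))  # (a_i, a_{i+1}) for i = 1..n-1
    return all(
        a in state.believes_exec            # clause 1: BEL(G, EXEC(a_i))
        and (a, b) in state.believes_gen    # clause 2: BEL(G, GEN(a_i, a_{i+1}))
        and a in state.intends              # clause 3: INT(G, a_i)
        and (a, b) in state.intends_by      # clause 4: INT(G, BY(a_i, a_{i+1}))
        for a, b in links
    )


# Illustrative acts, not taken from the paper.
agent = AgentState(
    believes_exec={"flip-switch"},
    believes_gen={("flip-switch", "turn-on-light")},
    intends={"flip-switch"},
    intends_by={("flip-switch", "turn-on-light")},
)
```

Keeping beliefs and intentions in separate fields mirrors the distinction the schema draws between believing that actions fit together, believing that they can be executed, and intending to act.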

Thus, the four main clauses in Pollack’s schema concern (1) an agent’s beliefs
about executability of his or her actions, (2) an agent’s beliefs about generation
relationships between actions^10, (3) intentions of the agent to do actions, and (4)
intentions for the actions being done to play a role in the plan itself. In general for
shared plans we modify her schema as follows:

SharedPlan(G1 G2 A) <=>^11

1. MB(G1 G2 (EXEC (action_j, G_i)))

2. MB( ... )

3. MB(G1 G2 (INT (G_i, action_j)))

4. MB(G1 G2 (INT (G_i, BY (action_j, A))))

5. INT(G_i, action_j)

6. INT(G_i, BY (action_j, A)).

^9 This means for the moment we will only consider actions related by generation; we will discuss
the extension to enabling relationships later.

^10 In an extended model of plans and actions, other types of relationships between actions (e.g.,
enabling relationships) would be included here.

^11 We are leaving out the time parameters for the moment, but will include them below in those
cases where certain of their properties are important.
The index j ranges over all of the acts involved in doing A; for each action_j one of
the agents, G1 or G2, is the agent of that action. That is, the action consisting of the
act-type action_j, done by agent G1 or G2 (as appropriate), at time t contributes to
G1 and G2’s plan to accomplish A. Like Pollack, we will use the constructor function
ACHIEVE to turn properties (i.e., states of affairs) into act-types. If G1 and G2
construct a SharedPlan to have a clean room, we will say there is a SharedPlan(G1
G2 Achieve(Clean-room)).

The content of clause (2) depends on the types of actions being done. We will
consider four key classes of SharedPlans here: those involving simultaneous actions
by two agents, conjoined actions by two agents, sequences of actions by two agents,
and those involving actions by only one agent.

This definition differs from Pollack’s in two ways: the beliefs about relations
among actions are mutual beliefs, and different agents may perform different
actions. Because different agents may be involved in acting, it becomes necessary to
add that there is mutual belief among all participants about one another’s intentions
and about the way in which those intentions support the achievement of the overall
goal. Notice that this means that a SharedPlan is not simply the mutual belief of
one (or two) SimplePlans.

It may seem that mutual belief is too strong a demand on the discourse because
not all of the intentions and actions in the SharedPlan are (necessarily) made public
by the utterances in the discourse. The very fact that both participants know they
are constructing a SharedPlan obviates this difficulty. It allows a discourse
participant to infer those mutual beliefs needed for the SharedPlan but not mentioned
(provided he does not have information to the contrary) and to assume that other
participants will do the same.

The SharedPlan thus provides a key piece of the puzzle of defining relevance in a
discourse. One of its functions is to distinguish from among all those mutual beliefs
not explicitly mentioned in the discourse the ones that are relevant to the discourse;
they are those that play a role in the SharedPlan. The SharedPlan is constructed
from a combination of those beliefs and intentions explicitly mentioned and those
prior mutual beliefs selected on the basis of the need to construct the SharedPlan.
Any belief needed for there to be a plan, but not mentioned, is a relevant prior belief.
Any belief that cannot be inferred on the basis of what has been made explicit and
on prior beliefs must be made explicit or inferrable.

It  has  been  argued  (e.g.,  Sperber  and  Wilson  [SW86])  that  mutual  belief  is  not 
the  appropriate  relation  for  communication.  A  central  part  of  this  argument  is, 
roughly,  that  the  participants  do  not  need  to  have  identical  beliefs,  and  furthermore 
there  is  no  reason  to  believe  that  people  actually  do  have  identical  beliefs.  However, 
in  the  case  of  a  SharedPlan  mutual  belief  is  crucial  to  action;  multiple  agents  cannot 
act with any assurance unless there is such mutual belief [HM84].

We are still, however, left with the question of how the participants come to agree
to construct a SharedPlan. We believe this depends on a conversational rule similar
to Grice’s conversational principles. The rule operates in the absence of evidence to
the contrary, i.e., it is a default rule. One of the conditions under which this rule
will not apply is if it is mutually believed that agent G1 can achieve P on her own.
The rule stipulates only that there will be mutual belief of a desire to achieve a
SharedPlan; to move from this to working on the SharedPlan requires that other
participants assent (either implicitly or explicitly). A first approximation to this rule
is that if the participants believe that one of them, say G1, has a particular desire,
say to achieve a state in which P holds, and they are cooperative (in general, and
with respect to achieving states like P in particular), and if they are communicating
about the desire to achieve P, then they mutually believe that G1 has a desire
for them to construct a SharedPlan to achieve P. A shorthand version of this rule
appears in Figure 2.

CDR1: MB(G1, G2, Desire(G1, P) &
                 Cooperative(G1, G2) &
                 Communicating(G1, G2, Desire(G1, P))) =>
      MB(G1, G2, Desire(G1, Achieve(SharedPlan(G1, G2, Achieve(P)))))

Figure 2: Conversational Default Rule 1
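To make the default character of CDR1 concrete, the rule can be sketched as a function that derives the new mutual belief only when the antecedent holds and no contrary evidence is present. This is a minimal Python sketch under stated assumptions: mutual beliefs are modeled as a flat set of tuples, the function name cdr1 is hypothetical, and the predicate name CanAchieveAlone is invented here to stand for the blocking condition that G1 can achieve P on her own.

```python
def cdr1(mutual_beliefs, g1, g2, p):
    """Return the content of the mutual belief licensed by CDR1, or None."""
    desire = ("Desire", g1, p)
    antecedent = {
        desire,
        ("Cooperative", g1, g2),
        ("Communicating", g1, g2, desire),
    }
    # The rule needs all three conjuncts to be mutually believed.
    if not antecedent <= mutual_beliefs:
        return None
    # Default rule: it is blocked by evidence to the contrary, here the
    # (hypothetically named) mutual belief that G1 can achieve P alone.
    if ("CanAchieveAlone", g1, p) in mutual_beliefs:
        return None
    return ("Desire", g1, ("Achieve", ("SharedPlan", g1, g2, ("Achieve", p))))


# Example antecedent for agents G1 and G2 and a desired state P.
mb = {
    ("Desire", "G1", "P"),
    ("Cooperative", "G1", "G2"),
    ("Communicating", "G1", "G2", ("Desire", "G1", "P")),
}
derived = cdr1(mb, "G1", "G2", "P")
```

The result is only G1's desire for a SharedPlan as mutually believed; as the text notes, moving to work on the plan still requires the other participant's assent.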

Likewise, if agent G1 desires that some action A be performed that requires G2
doing some (sub)actions, and G1 and G2 are cooperative (in general, and with
respect to doing actions like A in particular), and if they are communicating about
the desire to do A, then they mutually believe that G1 has a desire for them to
construct a SharedPlan to A. We will refer to this version of CDR1, with A replacing
P and Achieve(P) as appropriate, as CDR1'.

CDR1 (and CDR1') establishes the mutual belief of G1’s desire for a SharedPlan.
Before it can be said that G1 and G2 have a SharedPlan or even are working
on achieving a SharedPlan, it also must be the case that MB(G1, G2, Desire(G2,
Achieve(SharedPlan(G1, G2, Achieve(P))))). To establish this mutual belief, G2 has
to assent either explicitly or implicitly.

When G1 and G2 each have (and know the other has) the desire to achieve a
SharedPlan, but have not yet achieved the SharedPlan, they can be considered to
have a partial SharedPlan. This partial plan plays an important role in discourse
interpretation. We will use the notation SharedPlan*(G1 G2 Achieve(P)) to
indicate that G1 and G2 have agreed to work toward having a SharedPlan, but
have not yet achieved one. A partial SharedPlan*, like a SharedPlan, is a collection
of beliefs and intentions. It may be partial in either of two ways. First, it may
contain only some of the full collection of beliefs and intentions of its associated
full SharedPlan. Second, some of the beliefs included in it may be only partially
specified, as subsequent examples will illustrate.

^12 Agents will be said to have achieved a SharedPlan if they reach a state in which they have the
beliefs and intentions required for them to have a SharedPlan.

^13 In the remaining rule and plan specifications, we will use Achieve(P) as the desired action, and
not include the generalization to A which is straightforward to derive.

The existence of a SharedPlan* provides a crucial element of the background
against which to interpret utterances. In particular it provides the basis for linking
the desire on the part of one agent for another agent to act to the intentions of the
second agent to act. Again, this connection is not a hard rule, but rather reflects
a default assumption of the discourse situation. In particular, if there is a partial
SharedPlan* and a desire on the part of one agent, say G1, for another, say G2, to do
some action, and G2 believes he can perform that action, and that by performing
the action he will be contributing to the achievement of P, then G2 will (in the
absence of reasons to the contrary) adopt an intention to do the action. Again in
shorthand, we have


CDR2: [SharedPlan*(G1 G2 Achieve(P)) &
       Desire(G1, Do(G2, Action)) &
       Believe(G2, Exec(G2, Action)) &
       Believe(G2, Contribute(Action Achieve(P)))]
      => Intend(G2, Action).


This rule is schematic. Contribute is a place holder for any relation (e.g., GEN,
ENABLE) that can hold between actions when one can be said to contribute (e.g.,
by generating or enabling) to the performance of the other.
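The four antecedent conditions of CDR2 and its default status can be sketched in the same style. The Python below is an illustration only, assuming that partial plans, desires, and G2's beliefs are each kept as sets of tuples; the function name cdr2, the reasons_against parameter, and the example action are hypothetical. The goal Clean-room is borrowed from the earlier Achieve(Clean-room) example.

```python
def cdr2(partial_plans, desires, beliefs_g2, g1, g2, action, p,
         reasons_against=frozenset()):
    """Return ("Intend", g2, action) if CDR2 applies by default, else None."""
    goal = ("Achieve", p)
    conditions = (
        ("SharedPlan*", g1, g2, goal) in partial_plans,   # partial SharedPlan* exists
        ("Desire", g1, ("Do", g2, action)) in desires,    # G1 wants G2 to act
        ("Exec", g2, action) in beliefs_g2,               # G2 believes he can act
        ("Contribute", action, goal) in beliefs_g2,       # G2 believes it helps
    )
    # Default rule: G2 adopts the intention only absent reasons to the contrary.
    if all(conditions) and action not in reasons_against:
        return ("Intend", g2, action)
    return None


# Hypothetical action "pick-up-toys" contributing to a clean room.
plans = {("SharedPlan*", "G1", "G2", ("Achieve", "Clean-room"))}
desires = {("Desire", "G1", ("Do", "G2", "pick-up-toys"))}
beliefs = {
    ("Exec", "G2", "pick-up-toys"),
    ("Contribute", "pick-up-toys", ("Achieve", "Clean-room")),
}
result = cdr2(plans, desires, beliefs, "G1", "G2", "pick-up-toys", "Clean-room")
```

Contribute is left as an uninterpreted tag here, in keeping with its role as a place holder for relations such as GEN or ENABLE.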

We are now in a position to look at some particular examples of SharedPlans to
see both how the second clause of the definition is fleshed out and how the
SharedPlan can be used to explain certain properties of discourse. In the discussion, we
will  refer  to  various  utterances  providing  information  for  clauses  of  some  SharedPlan 
or  SharedPlan*.  It  is  important  not  to  confuse  such  references  with  any  notion  of 
filling  in  a  frame  for  a  SharedPlan.  A  SharedPlan  is  not  a  data  structure  (or  any 
mental  construct  analogous  to  one),  but  rather  is  a  way  for  us  to  attribute  a  certain 
collection  of  beliefs  and  intentions  to  discourse  participants.  The  participants  in 
a  discourse  mutually  believe  they  are  working  toward  establishing  the  beliefs  and 
intentions  that  are  necessary  for  one  to  say  that  they  have  a  SharedPlan.  They  also 
share  knowledge  (at  least  implicitly)  of  which  beliefs  and  intentions  are  necessary 
for  them  to  be  in  the  mental  state  corresponding  to  having  a  SharedPlan.  They 
use  the  discourse  in  part  to  establish  mutual  belief  of  the  appropriate  beliefs  and 
intentions. 

5.1  Simultaneous  Actions 

The first type of SharedPlan to consider is one in which two agents must act
simultaneously to achieve the desired state of affairs. We will refer to such plans as
SharedPlan1(G1, G2, Achieve(simultaneous-result)). As an example of simultaneous
actions by different agents, we will consider the case of two agents, G1 and G2,
lifting a piano together. In SharedPlans of this type, clause (2) is of the form


MB(G1, G2, [OCCURS(α_i, G1, T1) <=> GEN(β_j, γ, G2, T1) &
            OCCURS(β_j, G2, T1) <=> GEN(α_i, γ, G1, T1)], T0)

or, more succinctly,

MB(G1, G2, GEN-Simultaneous[(α_i & β_j), γ, G1&G2, T1], T0).


For simultaneous actions, it must be mutually believed that each agent’s own
actions will have the proper generation relationship with the desired action (γ) if,
and only if, the other agent performs his actions at the same time. Simultaneous
actions are distinguished by the need for the time of performance of both actions to
be the same.

We  begin  with  a  very  simple  discourse  example.  Although  the  example  involves 
simultaneous  action  (itself  complicated),  the  discourse  includes  explicit  mention  of 
relevant  intentions  and  explicit  assent  on  the  parts  of  both  participants  to  under¬ 
taking  various  actions. 

Discourse D1:

1. S1: I want to lift the piano.
2. S2: OK.
3.     I will pick up this [deictic to keyboard] end.
4. S1: OK.
5.     I will pick up this [deictic to foot] end.
6. S2: OK.
7.     Ready?
8. S1: Ready.

We  will  assume  an  analysis  like  Perrault's  [this  volume]  using  defaults  for  de¬ 
termining  the  immediate  consequences  of  each  utterance.  Hence,  from  (1),  (3),  and 
(5)  respectively  the  participants  can  infer 


(1') MB(S1, S2, Desire(S1 lift(piano)))

(3') MB(S1, S2, INT(S2 lift(keyboard-end)))

(5') MB(S1, S2, INT(S1 lift(foot-end))).
From (1') and CDR1' and appropriate assumptions about the agents’ cooperativeness,
they can infer that

MB(S1, S2, Desire(S1, Achieve(SharedPlan1(S1, S2, lift(piano)))))

Hence, following utterance (1), S2 could (coherently) respond in any one of the
following ways:

• explicitly dissent from accepting the SharedPlan (“I can’t help now.”),

• implicitly dissent (“I hurt my back.”),

• explicitly assent to construct a SharedPlan (above example),

• implicitly assent to construct a SharedPlan (“Which end should I get?”
  “Do you have a handtruck?”).

In Utterance (2), S2 explicitly assents to work on achieving the SharedPlan
for lifting the piano. Utterance (3), by providing the information in (3'), provides
information needed for the SharedPlan. It expresses the intentions exhibited in
clauses (3) and (4) of the SharedPlan, and implicitly expresses S2’s belief that S2
can execute the intended action. S1’s assent to this proposed action in utterance
(4) allows derivation of mutual belief of executability as well as the relevance of this
act to achieving the desired goal (i.e., a portion of the belief exhibited in clause (4)).
Utterance (5), analogously to Utterance (3), expresses intentions (now additional
ones) exhibited in clauses (3) and (4), as well as a new individual belief about
executability. Utterance (6) allows derivation of mutual belief of executability.

This discourse does not include any explicit mention of the generation
relationship exhibited in Clause (2). From the context in which (3) and (5) are uttered,
the participants can infer that the mentioned actions are seen to participate in a
generation relationship with the desired action. That these actions together are
sufficient is implicit in utterances (7) and (8). S1 and S2 can now infer that the
generation relation exhibited in Clause (2) holds. Therefore the SharedPlan comprises
the following mutual beliefs and intentions:

SharedPlan1(S1 S2 lift(piano))

1. MB(S1 S2 (EXEC (lift(foot-end)) S1)) & MB(S1 S2 (EXEC (lift(keyboard-end)) S2))

2. MB(S1, S2, GEN-simultaneous[lift(foot-end) & lift(keyboard-end), lift(piano),
S1 & S2])

3. MB(S1 S2 (INT S2 (lift(keyboard-end)))) & MB(S1 S2 (INT S1 (lift(foot-end))))

4. MB(S1 S2 (INT S2 (BY (lift(keyboard-end)) lift(piano)))) & MB(S1 S2 (INT
S1 (BY (lift(foot-end)) lift(piano))))

5. INT(S2 lift(keyboard-end)) & INT(S1 lift(foot-end))

6. INT(S2 (BY (lift(keyboard-end)) lift(piano))) & INT(S1 (BY (lift(foot-end))
lift(piano)))
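The six clauses for the piano example can be checked mechanically once the attributed beliefs and intentions are written down. The paper stresses that a SharedPlan is not a data structure but a way of attributing beliefs and intentions to participants, so the Python sketch below should be read only as external bookkeeping for that attribution. Time parameters are omitted, the tuple encodings are an assumption of this sketch, and the names missing_clauses and piano_state are hypothetical.

```python
GOAL = "lift(piano)"
# Which agent performs which action in the piano example.
ASSIGNMENT = {"lift(foot-end)": "S1", "lift(keyboard-end)": "S2"}
# Clause (2) for the simultaneous case, encoded as a single proposition.
GENERATION = ("GEN-simultaneous", frozenset(ASSIGNMENT), GOAL)


def missing_clauses(mutual_beliefs, intentions, goal, assignment, generation):
    """List (clause number, item) pairs not yet established."""
    missing = []
    for action, agent in assignment.items():
        own = intentions.get(agent, set())
        if ("EXEC", action, agent) not in mutual_beliefs:
            missing.append((1, action))
        if ("INT", agent, action) not in mutual_beliefs:
            missing.append((3, action))
        if ("INT", agent, ("BY", action, goal)) not in mutual_beliefs:
            missing.append((4, action))
        if action not in own:
            missing.append((5, action))
        if ("BY", action, goal) not in own:
            missing.append((6, action))
    if generation not in mutual_beliefs:
        missing.append((2, goal))
    return missing


def piano_state():
    """Beliefs and intentions attributed to S1 and S2 at the end of D1."""
    mutual_beliefs = {GENERATION}
    intentions = {"S1": set(), "S2": set()}
    for action, agent in ASSIGNMENT.items():
        mutual_beliefs |= {
            ("EXEC", action, agent),
            ("INT", agent, action),
            ("INT", agent, ("BY", action, GOAL)),
        }
        intentions[agent] |= {action, ("BY", action, GOAL)}
    return mutual_beliefs, intentions
```

An empty result corresponds to the full SharedPlan1 above; a nonempty result corresponds to a plan that is partial in the first sense described earlier, containing only some of the required beliefs and intentions.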

The use of the concept of a SharedPlan eliminates the need for any notion of one
agent intending for another agent to intend some action; i.e., we have no need for
clauses of the form Intend(G1 Intend(G2 Do(Action))). Rather (as exhibited in
Clause (2)), the participants must have mutual belief of the ways in which actions by
each agent done simultaneously generate a single (joint) action [namely, lift(piano)].
As stated in clause (6), S2 intends to lift the piano by lifting the keyboard-end
(alone); she can do this only because she believes (there is a mutual belief) that S1
will simultaneously lift the foot-end.

In addition, one can attribute to S2 the intention to lift the piano by lifting
the keyboard-end as exhibited in Clause (6). Rather than rely on a notion of
“we-intentions” as does Searle [this volume], we postulate individual intentions embedded
in a plan for joint action. Plans for joint action include [mutual] beliefs of the ways
in which the actions of individual agents contribute to the performance of a desired
[joint] action of which they are a part.

The  desire  to  provide  an  appropriate  account  of  imperative  utterances  (i.e.,  one 
that  did  not  depend  on  the  notion  of  one  agent  intending  for  another  agent  to  intend 
to  do  some  action)  was  a  primary  motivation  for  SharedPlans.  Hence,  we  turn  next 
to  a  variant  of  the  preceding  discourse  which  is  differentiated  by  the  use  of  an  imper¬ 
ative,  in  Utterance  (4).  Notice  that  essentially  the  same  information  about  how  to 
lift  the  piano,  and  about  intentions  to  do  various  actions,  is  conveyed  in  this  variant. 


Discourse D2:

1. S1: I want to lift the piano.
2. S2: OK.
3. S1: I will pick up this [deictic to foot] end.
4.     You get that [deictic to keyboard] end.
5. S2: OK.
6. S1: Ready?
7. S2: Ready.

In this discourse, just as in D1, utterances (1)-(3) establish that SharedPlan*(S1,
S2, lift(piano)) and that S1 intends to lift the foot-end as part of the SharedPlan.
The imperative in utterance (4) conveys that Desire(S1, Do(S2, lift(keyboard-end))).
Given the SharedPlan*, CDR2 would apply if (Believe (S2 (EXEC (lift(keyboard-end))
S2))) and there were some α for which (Believe S2 (GEN-simultaneous[lift(keyboard-end)
& α, lift(piano)])). In this case, there is such an α, namely (lift(foot-end),
S1). S2’s assent in (5) conveys these two beliefs, and hence we can conclude in
addition that (INT S2 (lift(keyboard-end))) and (INT S2 (BY (lift(keyboard-end))
lift(piano))). The remainder of this discourse and the derivation of SharedPlan1
goes as in the first discourse.

As  a  final  variant  of  the  first  discourse,  we  consider  an  example  in  which  infor¬ 
mation  conveyed  in  multiple  utterances  in  D1  and  D2  is  conveyed  in  the  utterances 
of  a  single  turn  by  one  participant.  This  single-speaker  sequence  achieves  the  same 
purposes  as  the  longer  sequence  involving  both  participants  did  in  the  previous  di¬ 
alogues. 


Discourse D3:

1. S1: I want to lift the piano. You get that end; I’ll get this end.
2. S2: OK.
3. S1: OK. Ready, lift.

In D3-1 S1 expresses not only a desire, but also a proposed way of satisfying that
desire; in combination with CDR1 this gives a proposal for a shared plan and also
some details about the beliefs and intentions involved. In particular Utterance (1)
conveys S1’s beliefs about executability, his intentions to perform certain actions, and his
beliefs about the role of these actions in satisfying the desire to have the piano lifted.
In Utterance (2), S2 assents to participating in the SharedPlan, to the appropriate
mutual beliefs (i.e., those in Clauses (1) through (4)) holding, and to his having
the necessary intentions for Clauses (5) and (6). The major difference between this
discourse and the previous ones is that S2 does not get a chance to assent to a
SharedPlan until most of the details of the plan are formulated and proposed by S1.
Thus, S2’s “OK” in utterance (2) is assent to far more than in the previous examples.
An indication of S2’s implicit assent to the construction of a SharedPlan comes from
his not interrupting S1; had S2 not wanted to participate in a SharedPlan, it would
be most natural for him to say so immediately.

5.2 Conjoined Actions

A  similar  type  of  SharedPlan  may  be  constructed  when  the  actions  of  two  agents 
taken  together,  but  not  necessarily  simultaneously,  achieve  a  desired  result.  For 
example,  a  table  may  be  set  by  two  people  each  of  whom  performs  some  of  the 
necessary  actions  (e.g.,  one  putting  on  the  silverware,  the  other  the  plates  and 
glasses).  In  such  cases,  there  is  a  simple  conjunction  of  actions,  rather  than  a  need 
for  simultaneity.  That  is,  although  the  actions  must  all  be  performed  within  some 
time interval, say T3, they need not be performed at exactly the same time. For
this case, SharedPlan2(G1, G2, Achieve(Conjoined-result)), Clause (2) is of the form

MB(G1, G2, [OCCURS(α_i, G1, T1) <=> GEN(β_j, γ, G2, T3) &
            OCCURS(β_j, G2, T2) <=> GEN(α_i, γ, G1, T3)], T0)

where DURING(T1, T3) and DURING(T2, T3).^14

Whereas the time intervals T1 and T2 must both be within the interval T3, they
may overlap or be disjoint.

Again, more briefly,

MB(G1, G2, GEN-Conjoined[(α_i & β_j), γ, G1&G2, T3], T0).

A discourse or dialogue for this variant is similar to that of the simultaneous
action; the main difference is in the exact times at which the actions are done.

5.3 Sequences of Actions

A somewhat more complicated variant of SharedPlan is one in which a sequence
of actions together generate the desired action. For example, turning a door knob
followed by pulling on the door knob together (under appropriate conditions, e.g.,
the door being unlocked) generate opening the door.

For SharedPlan3(G1, G2, Achieve(Sequence-result)), Clause (2) is of the form

MB(G1, G2, [OCCURS(α_i, G1, T1) <=> GEN(β_j, γ, G2, T3) &
            OCCURS(β_j, G2, T2) <=> GEN(α_i, γ, G1, T3)], T0)

where STARTS(T1, T3) and FINISHES(T2, T3) and MEETS(T1, T2).

Or, more briefly (using a semicolon to represent sequence, following its adaptation
from dynamic logic by Rosenschein [Ros81]),

MB(G1, G2, GEN-Sequence[(α_i; β_j), γ, G1&G2, T3], T0).

^14 As Goldman defined generation, no act β can generate an action γ if γ takes longer than β. For
both GEN-Conjoined and GEN-Sequence, defined in the next section, this condition of generation
is violated. For GEN-Conjoined and GEN-Sequence, it is the case that α and β together span
exactly the interval of γ, but it is not necessarily true that each individually will do so. By using
generation for these cases, we are adapting Goldman’s definition to circumstances of multi-agent
action, a matter Goldman himself did not consider.

BBN Laboratories Inc., Report No. 6728

The case of a sequence of actions generating a desired action is not the same as
an action enabling another action. Both α and β must be done to achieve γ, and α
must be done before β, but α does not enable β. In the door knob example, turning
the knob does not enable pulling on it; this can be seen quite simply by noting that
one can also pull and then turn. The two actions together generate opening the
door.

The  discourses  for  this  variant  may  again  be  similar  to  that  of  the  preceding 
cases;  however  the  sequencing  of  actions  must  be  made  explicit  or  already  be  mu¬ 
tually  believed. 
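The three multi-agent cases differ only in the temporal constraints relating the action intervals T1 and T2 to the interval T3 of the generated action. The Python sketch below illustrates those constraints under stated assumptions: intervals are simple (start, end) pairs, the helper names are hypothetical, and the containment test used for the conjoined case is inclusive, which is a simplification made here and may be looser than the DURING relation of the interval logic the paper adopts.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float


def meets(a, b):
    """a ends exactly where b begins."""
    return a.end == b.start


def starts(a, b):
    """a begins with b and ends before b does."""
    return a.start == b.start and a.end < b.end


def finishes(a, b):
    """a ends with b and begins after b does."""
    return a.end == b.end and a.start > b.start


def within(a, b):
    """Inclusive containment of a in b (an assumption of this sketch)."""
    return b.start <= a.start and a.end <= b.end


def classify(t1, t2, t3):
    """Name the SharedPlan case whose time constraints t1, t2, t3 satisfy."""
    if t1 == t2 == t3:
        return "simultaneous"   # both actions occupy the same interval
    if starts(t1, t3) and finishes(t2, t3) and meets(t1, t2):
        return "sequence"       # t1 then t2, together spanning t3
    if within(t1, t3) and within(t2, t3):
        return "conjoined"      # both inside t3, overlapping or disjoint
    return "none"
```

The order of the tests matters only because the inclusive containment check would otherwise also accept the simultaneous and sequential configurations.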

5.4  SharedPlans  with  a  Single  Actor 

The final cases we will consider are SharedPlans in which only one agent actually
performs  any  actions.  One  such  SharedPlan  is  analogous  to  Pollack’s  SimplePlans; 
the  others  are  analogous  to  the  three  cases  (simultaneous,  conjoined,  sequential 
actions)  discussed  previously.  We  will  give  the  definition  for  the  first  case;  the  others 
differ  only  in  Clause  (2);  the  appropriate  change  can  be  determined  straightforwardly 
from  the  multiagent  cases. 

A  single  agent  SharedPlan  differs  from  Pollack’s  SimplePlan  in  that  the  initial 
desire that leads to the plan is one agent’s (say G1) whereas another agent (say G2)
acts.  For  this  case,  the  definition  is 

SharedPlan4(G1, G2, α_n, T1, T0)

1. MB[G1, G2, EXEC(α_i, G2, T1) T0]  i=1, ..., n-1

2. MB[G1, G2, GEN(α_i, α_{i+1}, G2, T1) T0]  i=1, ..., n-1

3. MB[G1, G2, INT(G2, α_i, T1, T0) T0]  i=1, ..., n-1

4. MB[G1, G2, INT(G2, BY(α_i, α_{i+1}), T1, T0) T0]  i=1, ..., n-1

5. INT[G2, α_i, T1, T0]

6. INT[G2, BY(α_i, α_{i+1}), T1, T0]
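As a reading aid, the six clauses can be instantiated mechanically for a concrete chain of actions. The sketch below is only an illustration: the function name `shared_plan4_clauses` and the use of plain strings for actions, agents and times are assumptions made here, not part of the formalism.

```python
from typing import List


def shared_plan4_clauses(g1: str, g2: str, actions: List[str],
                         t1: str = "T1", t0: str = "T0") -> List[str]:
    """Instantiate Clauses (1)-(6) for a chain of actions a_1 ... a_n.

    `actions` lists a_1 ... a_n in order; each a_i is taken to generate
    a_{i+1}, and i runs from 1 to n-1 as in the definition. Clauses are
    grouped per value of i, not per clause number.
    """
    if len(actions) < 2:
        raise ValueError("need at least two actions (n >= 2)")
    clauses: List[str] = []
    for a_i, a_next in zip(actions, actions[1:]):
        clauses += [
            f"MB[{g1}, {g2}, EXEC({a_i}, {g2}, {t1}) {t0}]",
            f"MB[{g1}, {g2}, GEN({a_i}, {a_next}, {g2}, {t1}) {t0}]",
            f"MB[{g1}, {g2}, INT({g2}, {a_i}, {t1}, {t0}) {t0}]",
            f"MB[{g1}, {g2}, INT({g2}, BY({a_i}, {a_next}), {t1}, {t0}) {t0}]",
            f"INT[{g2}, {a_i}, {t1}, {t0}]",
            f"INT[{g2}, BY({a_i}, {a_next}), {t1}, {t0}]",
        ]
    return clauses


# Usage with the tape-mounting actions discussed later in this section.
for clause in shared_plan4_clauses(
        "User", "System",
        ["mount-tape-NR(tape1)", "Achieve(tape-mounted-NR(tape1))"],
        t1="NOW+5min", t0="NOW"):
    print(clause)
```

With a two-action chain (n = 2) the index i takes only the value 1, so exactly six clauses are produced.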

SharedPlan4 appears equivalent to a SimplePlan (as Pollack has defined it) embedded
in a mutual belief context, combined with G2’s in fact having this SimplePlan,
i.e., SimplePlan(G2, α_n, α_i, T1, T0) [i=1, ..., n-1] and MB[G1, G2,
SimplePlan(G2, α_n, α_i, T1, T0)] [i=1, ..., n-1]. However, this formulation does not
provide a basis for explaining how to derive MB(G1, G2, INT(G2, Achieve(P)))
from MB(G1, G2, Desire(G1, P)), nor how to subsequently infer that G2 has a
SimplePlan for achieving P and that G1 and G2 mutually believe G2 has this
SimplePlan. The first of these inferences is most difficult, because it requires explaining
how the desire on the part of one agent leads to intentions on the part of the other
agent. The combination of SharedPlans and the CDRs (along with rules about what
agents must assent to) provides the needed link.

We  will  illustrate  the  role  of  SharedPlan4  and  the  two  CDRs  in  explaining  the 
following dialogue from a corpus collected by Mann (we present it as cited in Litman’s
thesis [Lit86]):

Discourse D4:

(1)  User: Could you mount a magtape for me?

(2)  It's tape1.

(3)  No ring please.

(4)  Can  you  do  it  in  five  minutes? 

(5)  System:  We  are  not  allowed  to  mount  that  magtape. 

(6)  You’ll  have  to  talk  to  operator  about  it. 

(7)  After  nine  a.m.  monday  through  friday. 

(8)  User:  How  about  tape2? 

Rather than viewing the user’s first turn, D4:(1)-(4), as describing a plan the
user  alone  has  for  achieving  certain  goals,  we  will  view  it  as  initiating  a  dialogue 
to  construct  a  SharedPlan  in  which  the  system  and  user  collaborate  to  satisfy  the 
user’s  desire  to  have  a  particular  tape  mounted  in  a  particular  way.  Because  in 
this  example  only  the  System  will  perform  any  physical  actions,  it  is  a  case  of 
SharedPlan4. 

We  discuss  only  the  User’s  first  turn.  As  in  the  final  piano  example,  each  of 
D4:(1) through (4) provides partial information about the SharedPlan. Again, the
System’s implicit concurrence (e.g., it doesn’t interrupt) allows the User to continue
providing additional information. Utterance (1) proposes a SharedPlan*, and subsequent
utterances provide continual refinement of it. More particularly, Utterance
(1)  results  in 

MB  (User,  System,  Desire  (User,  tape-mounted  (tapeX)))  for  some  tape,  tapeX. 

From CDR1 (and the System's implicit cooperativeness for this specific request) we
can infer that

MB(User, System, Desire(User, Achieve(SharedPlan
    (User, System, tape-mounted(tapeX))))).

From the lack of an interruption by the system, the user can infer that
SharedPlan*(User, System, Achieve(tape-mounted(tapeX)), NOW, t2). To get from this
state to one in which there is actually a SharedPlan requires, among other things,
a mutual belief of the form

MB[User, System, INT(System, mount-tape(tapeX), t2, NOW) NOW].

However, as written this intention is not well-formed because of the use of the
variable tapeX. For the system to have an intention to mount any tape, it must
know the identity of the tape to be mounted. The variable tapeX does not specify
an individual tape. The User’s second utterance, D4:(2), thus contributes to
constructing a SharedPlan by establishing the identity of the tape to be mounted. It
is from this utterance that the System can infer tapeX=tape1. (We say more about
how  this  happens  in  Section  6.)  Utterance  D4:(3)  modifies  information  presented 
so  far  by  stating  that  the  desired  action  is  a  specialization  of  the  tape-mounting 
operation,  a  mounting  with  no  ring.  Finally,  utterance  (4)  sets  constraints  on  the 
time  of  execution  of  the  action  (NOW+  fewer  than  5  minutes). 
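The step-by-step refinement just described can be pictured as successive updates to a partial description of the requested action. The sketch below is a loose illustration under stated assumptions: the `MountRequest` record and its field names are hypothetical, and treating "well-formed" as "the tape variable is bound" is a simplification of the point made above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MountRequest:
    """Partial description of the tape-mounting action (hypothetical structure)."""
    tape: Optional[str] = None            # None stands for the unbound variable tapeX
    ring: Optional[bool] = None           # None means not yet specified
    within_minutes: Optional[int] = None  # None means no time constraint yet

    def intention_well_formed(self) -> bool:
        """The System can intend the action only once the tape is identified."""
        return self.tape is not None


plan = MountRequest()                     # (1) "Could you mount a magtape for me?"
assert not plan.intention_well_formed()   # tapeX is still unbound

plan = replace(plan, tape="tape1")        # (2) "It's tape1."
plan = replace(plan, ring=False)          # (3) "No ring please."
plan = replace(plan, within_minutes=5)    # (4) "Can you do it in five minutes?"

print(plan.intention_well_formed(), plan)
```

Each utterance leaves earlier information intact and adds one constraint, which mirrors the way Utterances (2) through (4) refine rather than replace the proposal made in (1).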

If the system had responded, “Yes, I will,” to Utterances D4:(1)-(4), then the
User and System would have succeeded in constructing a SharedPlan comprising
the beliefs and intentions shown in Figure 3 (where tape-mounted-NR is true if the
tape is mounted with no ring). CDR2 is essential to the derivation of (3)-(6) of
this  SharedPlan. 

To  explain  the  System’s  actual  response,  it  is  necessary  to  consider  the  state  of 
the  developing  SharedPlan  just  prior  to  Utterance  (5).  At  this  point,  the  User  has 
made  public  a  set  of  beliefs  he  holds  about  tape-mounting  actions,  about  relations 
among  them,  and  about  intentions  that  he  desires  the  System  to  have;  the  System 
is aware of these beliefs. With Utterance (5), the System establishes that the User’s
proposed SharedPlan cannot be constructed. In particular, the System makes it
clear that NOT[EXEC(mount-tape-NR(tape1), System, NOW+5min)]. Subsequent
utterances provide an alternative proposal for satisfying the User’s original
desire.

6  Feedforward and Backward

The  previous  section  describes  how  Utterance  (2)  contributes  to  constructing  the 
SharedPlan.  However,  it  is  also  the  case  that  the  SharedPlan  provides  a  context  in 
which to interpret Utterance (2). The ways in which information flows both forward
and backward in this discourse can best be seen by adopting an action-oriented

Footnote: We presume the indirect interpretation of (1). The direct interpretation of (1), i.e., querying
whether EXEC(mount-tape(tapeX), System, t2), would lead to another SharedPlan. One might
argue that it is only with Utterance (2) that we are sure about the indirection.


1. MB[User, System, EXEC(mount-tape-NR(tape1),
   System, NOW+5min) NOW]

2. MB[User, System, GEN(mount-tape-NR(tape1),
   Achieve(tape-mounted-NR(tape1)), System, NOW+5min), NOW]

3. MB[User, System, INT(System, mount-tape-NR(tape1),
   NOW+5min, NOW) NOW]

4. MB[User, System, INT(System, by(mount-tape-NR(tape1),
   Achieve(tape-mounted-NR(tape1))), NOW+5min, NOW) NOW]

5. INT(System, mount-tape-NR(tape1), NOW+5min, NOW)

6. INT(System, by(mount-tape-NR(tape1),
   Achieve(tape-mounted-NR(tape1))), NOW+5min, NOW)


Figure 3: SharedPlan for Mounting Tape1

stance towards utterances. In particular, an utterance itself is an action which can
generate and enable other actions. From this perspective Utterances (1)-(4) may
be seen to have among their effects the establishment of the SharedPlan. We want
to look briefly at the more local utterance-to-utterance effects and their interactions.
By uttering, “Could you mount a magtape for me?” the user generates:

Achieve(MB[User, System,
    Desire(User, Informif(System, User,
        EXEC(System, mount-tape(tapeX))
        for some tapeX s.t. magtape(tapeX)))])

which in turn generates

Achieve(MB[User, System,
    Exists(tapeX, magtape(tapeX) &
        Desire(User, tape-mounted(tapeX)))]).



Another effect of the first utterance is to create a discourse entity, in this case
tapeX. Under the condition that there is some discourse entity which is a tape, say
tapeZ, any utterance of “It's tape1” conditionally generates (as defined by Goldman
[op. cit.] and Pollack [op. cit.]) Achieve(MB[User, System, tapeZ=tape1]). In

Footnote: We are sketchy about the indirectness here because that is not our main point.

this discourse, the user's first utterance provides the discourse entity that satisfies
this condition, namely tapeX. Thus Utterance (1) enables Utterance (2) to generate
Achieve(MB[User, System, tapeX=tape1]). Thus we see the first utterance as feeding
forward a discourse entity and the second utterance feeding back (to the partial
SharedPlan) information to flesh out the plan.
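The enabling relation between the two utterances can be mimicked with a toy discourse context. Everything in this sketch is a hypothetical simplification: a list of dictionaries stands in for the set of discourse entities, the function names are invented for illustration, and the generated effect is represented as a string.

```python
from typing import Dict, List, Optional


def utter_request(entities: List[Dict[str, str]]) -> None:
    """Utterance (1): feed forward a discourse entity for the unidentified tape."""
    entities.append({"name": "tapeX", "kind": "tape"})


def utter_identity(entities: List[Dict[str, str]], referent: str) -> Optional[str]:
    """Utterance (2), "It's tape1".

    Returns the mutual belief the utterance generates, or None when no tape
    entity is available, i.e. when the condition on generation fails.
    """
    for entity in reversed(entities):   # prefer the most recently introduced entity
        if entity["kind"] == "tape":
            return f"Achieve(MB[User, System, {entity['name']}={referent}])"
    return None


context: List[Dict[str, str]] = []
print(utter_identity(context, "tape1"))   # None: nothing for "it" to pick out
utter_request(context)
print(utter_identity(context, "tape1"))   # Achieve(MB[User, System, tapeX=tape1])
```

The same words produce an effect only in the second call, after the first utterance has supplied the entity that satisfies the condition.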

Finally we might note that this treatment of action descriptions parallels previous
observations about object descriptions. The way in which the utterances D4:(1)-(4)
provide increasingly more information about the particular tape-mounting action
the user wishes the system to undertake is similar to the use of multiple utterances
to provide additional information about some object. For example, I might describe
a particular book to you as, “The book is on the coffee table. It’s Percy’s The
Message in the Bottle. Bright orange cover and silver letters.”

7  Further  Work 

The notion of SharedPlan was developed both to help explain the collaborative type
of plans that seem to underlie discourse and to provide the basis for recognition of
intention at the discourse (as opposed to utterance) level. Further exploration of
this notion requires fundamental research in two areas: (1) specification of relations
between actions that are more complex than generation (e.g., enabling relations),
and their role in SharedPlans of various sorts; (2) examination of the details of the
recognition process (e.g., recognition algorithms for beliefs and intentions that must
be shared for there to be a SharedPlan).

As Pollack has pointed out, the enabling relationship and the way it enters a
plan introduce a number of complexities into the plan formalization and recognition
process. Although a detailed treatment of enabling relationships awaits further
research, we can use a simple example to illustrate how enabling relations would fit
with SharedPlans.

Consider the utterance, “Please pass the butter,” in the context of the speaker’s
eating  dinner  with  the  hearer,  and  the  dinner  including  corn  on  the  cob  (and  nothing 
else  butterable).  Figure  4  shows  the  action  decomposition  relevant  to  this  utterance 
and  the  buttering  of  a  cob  of  corn.  In  place  of  the  generation  relation  that  is  used 
in  plan  definitions  for  Pollack’s  SimplePlan  and  the  SharedPlans  presented  in  this 
paper,  the  plan  sketched  in  this  figure  requires  more  complex  action  relationships. 
A portion of this decomposition will form the core of the beliefs of a SharedPlan
that results in satisfying a condition on a private SimplePlan. The
SharedEnablePlan(S, H, Achieve(Have-Butter)) satisfies the condition (Have-Butter) needed for
S’s SimplePlan of SimplePlan(...Achieve(buttered-corn)).

Prior  work  on  plan  formalisms  and  plan  recognition  used  a  notion  of  subactions 
or  step  decomposition  to  capture  some  of  the  relationships  we  have  portrayed  here. 
However, as the example about door-opening in Section 5.3 illustrates, step
decomposition is used ambiguously to refer to generation relations, enabling relations, and
sequencing relations among actions.

The recognition process for SharedPlans as sketched in this paper proceeds
essentially as follows: the initial utterances put on the table a proposal that there
be a shared plan developed and carried out to satisfy the initiating conversational
participant’s desire; the subsequent utterance must somehow address this proposal,
either accepting or denying it; assuming the proposal is accepted, subsequent
utterances can provide information about any of the beliefs or intentions embedded in
the definition of a SharedPlan. This process differs significantly from prior work on
recognition in that it does not presume a fixed plan on the part of one participant,
the form and content of which must be inferred by the other(s). Instead,
collaborative planning entails a negotiation in which information about actions, action
relationships, desires, and intentions is made sufficiently clear for all participants
to know how actions will be used to satisfy desires. Plan recognition is then the
determination of these beliefs and intentions.
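The propose, address, refine cycle described above can be caricatured as a small state machine. This is only a sketch under strong assumptions: the stage names and move labels are invented here, utterances are taken to arrive already classified into moves (which is the hard part of recognition and is not modeled), and implicit concurrence is treated as an explicit "accept".

```python
from enum import Enum, auto
from typing import Iterable


class Stage(Enum):
    NO_PROPOSAL = auto()
    PROPOSED = auto()    # a SharedPlan has been put on the table
    REFINING = auto()    # proposal accepted; details are being filled in
    REJECTED = auto()    # proposal denied; a new proposal may follow


# Hypothetical move labels; only these transitions are allowed.
TRANSITIONS = {
    (Stage.NO_PROPOSAL, "propose"): Stage.PROPOSED,
    (Stage.PROPOSED, "accept"): Stage.REFINING,
    (Stage.PROPOSED, "deny"): Stage.REJECTED,
    (Stage.REFINING, "inform"): Stage.REFINING,
    (Stage.REFINING, "deny"): Stage.REJECTED,
    (Stage.REJECTED, "propose"): Stage.PROPOSED,
}


def run(moves: Iterable[str], stage: Stage = Stage.NO_PROPOSAL) -> Stage:
    """Advance through the negotiation, rejecting moves that do not fit the stage."""
    for move in moves:
        key = (stage, move)
        if key not in TRANSITIONS:
            raise ValueError(f"move {move!r} does not address stage {stage.name}")
        stage = TRANSITIONS[key]
    return stage


# One rough reading of D4: (1) proposes, the System's silence counts as accept,
# (2)-(4) inform, (5) denies, and (8) offers a new proposal.
d4_moves = ["propose", "accept", "inform", "inform", "inform", "deny", "propose"]
print(run(d4_moves).name)  # PROPOSED
```

Unlike recognition of a fixed plan held by one participant, nothing here is inferred about a complete plan in advance; the state records only how far the negotiation has progressed.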